
HDDS-15605. Fix flaky testContainerExclusionWithClosedContainerException #10621

Open
chihsuan wants to merge 1 commit into apache:master from chihsuan:HDDS-15605

chihsuan (Contributor) commented Jun 27, 2026

What changes were proposed in this pull request?

testContainerExclusionWithClosedContainerException intermittently fails at the datanode assertion (Expecting empty but was: [<uuid>(null/null)]).

The test asserts that after a ClosedContainerException only the closed container is excluded. However, under the default ALL_COMMITTED watch level, a momentarily slow follower whose watch-for-commit times out is recorded in the client's exclude list. This is the intended slow-node-avoidance behaviour of the configurable watchType (HDDS-2887). An empty datanode set is therefore not an invariant under ALL_COMMITTED; the assertion predates the watchType config and never accounted for it.

The test's subject is container exclusion, which is independent of the watch level. This PR removes the non-invariant getDatanodes().isEmpty() assertion; the container and pipeline assertions remain. Watch-level datanode exclusion is already covered by testDatanodeExclusionWithMajorityCommit.
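To make the invariant argument concrete, here is a hypothetical, heavily simplified Java model of the client exclude list after a ClosedContainerException under the default ALL_COMMITTED watch level. It is not Ozone's ExcludeList or KeyOutputStream: the names ExcludeListSketch, simulateWrite, containerInvariantHolds and the "slow-follower" datanode ID are invented for illustration, and the pipeline check assumes the retained pipeline assertion expects no excluded pipelines.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the client exclude list after a ClosedContainerException.
 * Not Ozone's real ExcludeList; all names here are illustrative assumptions.
 */
class ExcludeListSketch {

    /** Minimal stand-in for an exclude list: datanodes, containers, pipelines. */
    static final class ExcludeList {
        final Set<String> datanodes = new HashSet<>();
        final Set<Long> containerIds = new HashSet<>();
        final Set<String> pipelineIds = new HashSet<>();
    }

    /**
     * Simulates a write (ALL_COMMITTED watch level assumed) that fails with a
     * ClosedContainerException on closedContainerId. slowFollowerTimedOut models
     * a follower whose watch-for-commit timed out shortly before the failure.
     */
    static ExcludeList simulateWrite(long closedContainerId, boolean slowFollowerTimedOut) {
        ExcludeList list = new ExcludeList();
        if (slowFollowerTimedOut) {
            // Slow-node avoidance: the lagging follower is excluded (timing-dependent).
            list.datanodes.add("slow-follower");
        }
        // The ClosedContainerException always excludes the closed container.
        list.containerIds.add(closedContainerId);
        return list;
    }

    /** The checks the test keeps: closed container excluded, no pipeline excluded. */
    static boolean containerInvariantHolds(ExcludeList list, long closedContainerId) {
        return list.containerIds.contains(closedContainerId) && list.pipelineIds.isEmpty();
    }

    public static void main(String[] args) {
        long containerId = 1L;
        for (boolean slow : new boolean[] {false, true}) {
            ExcludeList list = simulateWrite(containerId, slow);
            System.out.println("slowFollower=" + slow
                + " containerInvariant=" + containerInvariantHolds(list, containerId)
                + " datanodesEmpty=" + list.datanodes.isEmpty());
        }
    }
}
```

In this model the container check holds whether or not a follower was slow, while datanodes.isEmpty() flips with follower timing, which is the distinction the change relies on.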

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15605

How was this patch tested?

Ran intermittent-test-check on the fork against TestFailureHandlingByClient (test-name=ALL), 100 runs per branch (10 splits × 10 iterations):

  • This branch: the assertion failure did not recur (0/100). (run)
  • master baseline, same harness: the assertion failure reproduced in ~5/100 runs (Expecting empty but was: [...]), matching the reported intermittency. (run)
  • Remaining noise, unrelated to this change (also present on master):
    • A TimeoutException in TestHelper.waitForContainerClose, where the container state transition does not complete under the harness's heavy parallel load (10 concurrent splits). It occurs before the modified assertion and reflects a pre-existing load sensitivity of the test.
    • testDatanodeExclusionWithMajorityCommit failures are the known issue HDDS-13972 (the test is marked @Flaky).
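As a rough sanity check on the 0/100 versus ~5/100 comparison, the snippet below computes how likely zero failures in 100 runs would be if the change had no effect. It assumes runs are independent and that the true per-run failure rate is about 5% (taken from the master baseline); both are simplifying assumptions, and FlakeOdds and probZeroFailures are hypothetical names.

```java
/**
 * Rough odds check. Assumes independent runs and a fixed per-run failure
 * rate p; these are simplifying assumptions, not measured facts.
 */
class FlakeOdds {

    /** Probability of observing zero failures in n independent runs at rate p. */
    static double probZeroFailures(double p, int n) {
        return Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        // If the fix did nothing, how likely is a clean 0/100 at p = 0.05?
        double odds = probZeroFailures(0.05, 100);
        System.out.printf("P(0 failures in 100 runs | p=0.05) = %.4f%n", odds);
    }
}
```

Under these assumptions, 0.95^100 is roughly 0.006, so a clean 100-run streak would be unlikely (about 0.6%) if the flaky assertion were still present.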
